Abstract

Many computer vision problems can be formulated as 
binary quadratic programs (BQPs). Two classic relaxation 
methods are widely used for solving BQPs, namely, spec- 
tral methods and semidefinite programming (SDP), each 
with their own advantages and disadvantages. Spectral 
relaxation is simple and easy to implement, but its bound 
is loose. Semidefinite relaxation has a tighter bound, but 
its computational complexity is high for large scale prob- 
lems. We present a new SDP formulation for BQPs, with 
two desirable properties. First, it has a similar relaxation 
bound to conventional SDP formulations. Second, com- 
pared with conventional SDP methods, the new SDP for- 
mulation leads to a significantly more efficient and scalable 
dual optimization approach, which has the same degree of 
complexity as spectral methods. Extensive experiments on 
various applications including clustering, image segmen- 
tation, co-segmentation and registration demonstrate the 
usefulness of our SDP formulation for solving large-scale 
BQPs. 
1. Introduction

Many problems in computer vision can be formulated as 
binary quadratic problems, such as image segmentation, im- 
age restoration, graph-matching and problems formulated 
by Markov Random Fields (MRFs). Because general BQPs 
are NP-hard, they are commonly approximated by spectral 
or semidefinite relaxation. 

Spectral methods convert BQPs into eigen-problems. 
Due to their simplicity, spectral methods have been applied 
to a variety of problems in computer vision, such as image 
segmentation [20, 25], motion segmentation (TSJ and many
other MRF applications |T|. However, the bound of spectral
relaxation is loose and can lead to poor solution quality in
many cases [5, 12, 9]. Furthermore, the spectral formulation
is hard to generalize to accommodate inequality
constraints El.

In contrast, SDP methods produce tighter approximations
than spectral methods, and have been applied to
problems including image segmentation [6], restoration flOl
[TTl , subgraph matching ifTSl , co-segmentation [7] and general
MRFs [23]. The disadvantage of SDP methods, however,
is their poor scalability for large-scale problems. The
worst-case complexity of solving a generic SDP problem
involving a matrix variable of size $n \times n$ and $O(n)$ linear
constraints is about $O(n^{6.5})$, using interior-point methods.

In this paper, we present a new SDP formulation for 
BQPs (denoted by SDCut). Our approach achieves higher 
quality solutions than spectral methods while being signif- 
icantly faster than the conventional SDP formulation. Our 
main contributions are as follows. 

(i) A new SDP formulation (SDCut) is proposed to solve
binary quadratic problems. By virtue of its use of the dual
formulation, our approach is simplified and can be solved
efficiently by first order optimization methods, e.g.,
quasi-Newton methods. SDCut has the same level of computational
complexity as spectral methods, roughly $O(n^3)$,
which is much lower than that of the conventional SDP formulation
using interior-point methods. SDCut also achieves a
bound similar to that of the conventional SDP formulation and
therefore produces better estimates than spectral relaxation.
(ii) We demonstrate the flexibility of SDCut by applying it 
to a few computer vision applications. The SDCut formu- 



lation allows additional equality or inequality constraints, 
which enable it to have a broader application area than the 
spectral method. 

Related work Our method is motivated by the work of 
Shen et al. [19], which presented a fast dual SDP approach
to Mahalanobis metric learning. The Frobenius-norm reg- 
ularization in their objective function plays an important 
role, which leads to a simplified dual formulation. They, 
however, focused on learning a metric for nearest neighbor 
classification. In contrast, here we are interested in discrete 
combinatorial optimization problems arising in computer 
vision. In [8], the SDP problem was reformulated by the
non-convex low-rank factorization $\mathbf{X} = \mathbf{Y}\mathbf{Y}^\top$, where
$\mathbf{Y} \in \mathbb{R}^{n \times m}$, $m \ll n$. This method finds a locally-optimal
low-rank solution, and runs faster than the interior-point method.
We compare SDCut with the method in [8], on image
co-segmentation. The results show that our method achieves a
better solution quality and a faster running speed. Olsson et 
al. ifTTl proposed fast SDP methods based on spectral sub- 
gradients and trust region methods. Their methods cannot 
be extended to accommodate inequality constraints, while 
ours is much more general and flexible. Krislock et al. ifTTIl 
have independently formulated a similar SDP for the Max- 
Cut problem, which is simpler than the problems that we 
solve here. Moreover, they focus on globally solving the 
MaxCut problem using branch-and-bound.

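Notation A matrix is denoted by a bold capital letter ($\mathbf{X}$)
and a column vector by a bold lower-case letter ($\mathbf{x}$).
$\mathcal{S}_n$ denotes the set of $n \times n$ symmetric matrices.
$\mathbf{X} \succeq 0$ represents that the matrix $\mathbf{X}$ is positive
semidefinite (p.s.d.). For two vectors, $\mathbf{x} \le \mathbf{y}$ indicates
the element-wise inequality; $\mathrm{diag}(\cdot)$ denotes the diagonal
entries of a matrix. The trace of a matrix is denoted as
$\mathrm{trace}(\cdot)$. The rank of a matrix is denoted as $\mathrm{rank}(\cdot)$.
$\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $\ell_1$ and $\ell_2$ norm of a
vector respectively.
$\|\mathbf{X}\|_F = \sqrt{\mathrm{trace}(\mathbf{X}\mathbf{X}^\top)} = \sqrt{\mathrm{trace}(\mathbf{X}^\top\mathbf{X})}$
is the Frobenius norm. The inner product of two matrices is
defined as $\langle \mathbf{X}, \mathbf{Y} \rangle = \mathrm{trace}(\mathbf{X}^\top \mathbf{Y})$.
$\mathbf{X} \circ \mathbf{Y}$ denotes the Hadamard product of $\mathbf{X}$ and
$\mathbf{Y}$. $\mathbf{X} \otimes \mathbf{Y}$ denotes the Kronecker product of
$\mathbf{X}$ and $\mathbf{Y}$. $\mathbf{I}_n$ indicates the $n \times n$ identity
matrix and $\mathbf{e}_n$ denotes an $n \times 1$ vector with all ones.
$\lambda_i(\mathbf{X})$ and $\mathbf{p}_i(\mathbf{X})$ indicate the $i$th eigenvalue
and the corresponding eigenvector of the matrix $\mathbf{X}$. We define
the positive and negative part of $\mathbf{X}$ as:

$$\mathbf{X}_+ = \sum_{\lambda_i > 0} \lambda_i \mathbf{p}_i \mathbf{p}_i^\top, \qquad \mathbf{X}_- = \sum_{\lambda_i < 0} \lambda_i \mathbf{p}_i \mathbf{p}_i^\top, \tag{1}$$

and explicitly $\mathbf{X} = \mathbf{X}_+ + \mathbf{X}_-$.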

Euclidean projection onto the p.s.d. cone Our method 
relies on the following results (see Sect. 8.1 of LU): 

$$\mathbf{X}_+ = \arg\min_{\mathbf{Y} \succeq 0} \|\mathbf{Y} - \mathbf{X}\|_F^2. \tag{2}$$

Although (2) is an SDP problem, it can be solved efficiently
by using eigen-decomposition. This is the key observation 
to simplify our SDP formulation. 
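The projection in (2) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' MATLAB code: the helper name `psd_parts` and the random test matrix are assumptions introduced here.

```python
import numpy as np

def psd_parts(X):
    """Split a symmetric matrix into its positive and negative parts, Eq. (1)."""
    lam, P = np.linalg.eigh(X)                    # eigenvalues ascending, orthonormal P
    X_plus = (P * np.maximum(lam, 0.0)) @ P.T     # keep eigenvalues > 0
    X_minus = (P * np.minimum(lam, 0.0)) @ P.T    # keep eigenvalues < 0
    return X_plus, X_minus

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
X = (M + M.T) / 2                                 # hypothetical symmetric input
X_plus, X_minus = psd_parts(X)
Y = M @ M.T                                       # some other p.s.d. matrix
```

By (2), `X_plus` should be at least as close to `X` in Frobenius norm as any other p.s.d. matrix such as `Y`.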



2. Spectral and Semidefinite Relaxation 

As a simple example of a binary quadratic problem, we 
consider the following optimization problem: 



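$$\min_{\mathbf{x}} \; \mathbf{x}^\top \mathbf{A} \mathbf{x}, \quad \text{s.t. } \mathbf{x} \in \{-1, 1\}^n, \tag{3}$$

where $\mathbf{A} \in \mathcal{S}_n$. The integrality constraint makes the BQP
problem non-convex and NP-hard.

One of the spectral methods (again by way of example)
relaxes the constraint $\mathbf{x} \in \{-1, 1\}^n$ to $\|\mathbf{x}\|_2^2 = n$:

$$\min_{\mathbf{x}} \; \mathbf{x}^\top \mathbf{A} \mathbf{x}, \quad \text{s.t. } \|\mathbf{x}\|_2^2 = n. \tag{4}$$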



This problem can be solved by the eigen-decomposition
of $\mathbf{A}$ in $O(n^3)$ time. Although appealingly simple to
implement, the spectral relaxation often yields poor solution
quality. There is no guarantee on the bound of its solution
with respect to the optimum of (3). The poor bound
of spectral relaxation has been verified by a variety of
authors [5, 12, 9]. Furthermore, it is difficult to generalize the
spectral method to BQPs with linear or quadratic inequality
constraints. Although linear equality constraints can be
considered O, solving (4) under additional inequality
constraints is in general NP-hard [12].
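As a small illustration of (3) and (4), the sketch below solves the spectral relaxation by eigen-decomposition and compares its bound with the brute-force optimum of a tiny BQP. The function names, the sign-based rounding, and the random instance are assumptions made for illustration only.

```python
import itertools
import numpy as np

def spectral_relaxation(A):
    """Solve (4): the minimizer is sqrt(n) times the eigenvector of the smallest eigenvalue."""
    n = A.shape[0]
    lam, P = np.linalg.eigh(A)
    x_relaxed = np.sqrt(n) * P[:, 0]
    bound = n * lam[0]                                   # optimal value of (4)
    x_binary = np.where(x_relaxed >= 0, 1.0, -1.0)       # naive sign rounding (assumption)
    return bound, x_binary

def brute_force_bqp(A):
    """Exhaustively solve (3); only feasible for very small n."""
    n = A.shape[0]
    best = np.inf
    for s in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(s)
        best = min(best, x @ A @ x)
    return best

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2
bound, x_binary = spectral_relaxation(A)
optimum = brute_force_bqp(A)
value = x_binary @ A @ x_binary
print(bound, optimum, value)
```

The spectral bound never exceeds the true optimum, and the rounded solution never beats it; how large the gaps are depends on the instance.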

Alternatively, BQPs can be relaxed to semidefinite programs.
Firstly, let us consider an equivalent problem of (3):

$$\min_{\mathbf{X} \succeq 0} \; \langle \mathbf{X}, \mathbf{A} \rangle, \quad \text{s.t. } \mathrm{diag}(\mathbf{X}) = \mathbf{e}, \; \mathrm{rank}(\mathbf{X}) = 1. \tag{5}$$

The original problem is lifted to the space of rank-one p.s.d.
matrices of the form $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$. The number of variables
increases from $n$ to $n(n+1)/2$. Dropping the only non-convex
rank-one constraint, (5) is a convex SDP problem,
which can be solved conveniently by standard convex
optimization toolboxes, e.g., SeDuMi [21] and SDPT3 [22].
The SDP relaxation is tighter than spectral relaxation (4).
In particular, it has been proved in [4] that the expected
values of solutions are bounded for the SDP formulation
of some BQPs (e.g., MaxCut). Another advantage of the
SDP formulation is the ability to solve problems of more
general forms, e.g., quadratically constrained quadratic programs
(QCQP). Quadratic constraints on $\mathbf{x}$ are transformed
to linear constraints on $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$. In summary, the
constraints for SDP can be either equality or inequality.
The general form of the SDP problem is expressed as:

$$\min_{\mathbf{X} \succeq 0} \; \langle \mathbf{X}, \mathbf{A} \rangle, \tag{6a}$$

$$\text{s.t. } \langle \mathbf{X}, \mathbf{B}_i \rangle = b_i, \; \forall i = 1, \dots, p, \tag{6b}$$

$$\langle \mathbf{X}, \mathbf{B}_j \rangle \le b_j, \; \forall j = p+1, \dots, m. \tag{6c}$$

The most significant drawback of SDP methods is the poor
scalability to large problems. Most optimization toolboxes,
e.g., SeDuMi [21] and SDPT3 [22], use the interior-point
method for solving SDP problems, which has $O(n^{6.5})$
complexity, making it impractical for large scale problems.



3. SDCut Formulation 

Before we present the new SDP formulation, we first
introduce a property of the following set:

$$\Omega(\eta) = \{\mathbf{X} \in \mathcal{S}_n \mid \mathbf{X} \succeq 0, \; \mathrm{trace}(\mathbf{X}) = \eta\}. \tag{7}$$

The set $\Omega(\eta)$ is known as a spectrahedron, which is the
intersection of an affine subspace (i.e. $\mathrm{trace}(\mathbf{X}) = \eta$) and the
p.s.d. cone.

For the set $\Omega(\eta)$, we have the following theorem, which
is an extension of the one in ifTSll .

Theorem 1. (The spherical constraint on a spectrahedron).
For $\mathbf{X} \in \Omega(\eta)$, we have the inequality $\|\mathbf{X}\|_F \le \eta$, in which
the equality holds if and only if $\mathrm{rank}(\mathbf{X}) = 1$.

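Proof. For a matrix $\mathbf{X} \in \Omega(\eta)$, $\|\mathbf{X}\|_F^2 = \mathrm{trace}(\mathbf{X}\mathbf{X}^\top) =
\|\lambda(\mathbf{X})\|_2^2 \le \|\lambda(\mathbf{X})\|_1^2$. Because $\mathbf{X} \succeq 0$, then
$\lambda(\mathbf{X}) \ge 0$ and $\|\lambda(\mathbf{X})\|_1 = \mathrm{trace}(\mathbf{X})$. Therefore

$$\|\mathbf{X}\|_F = \|\lambda(\mathbf{X})\|_2 \le \|\lambda(\mathbf{X})\|_1 = \eta. \tag{8}$$

Because $\|\mathbf{x}\|_2 = \|\mathbf{x}\|_1$ holds if and only if at most one
element in $\mathbf{x}$ is non-zero, the equality holds for (8) if and
only if there is only one non-zero eigenvalue for $\mathbf{X}$, i.e.,
$\mathrm{rank}(\mathbf{X}) = 1$. $\square$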

This theorem shows the rank-one constraint is equivalent
to $\|\mathbf{X}\|_F = \eta$ for p.s.d. matrices with a fixed trace.

The constraint on $\mathrm{trace}(\mathbf{X})$ is common in the SDP
formulation for BQPs. For $\mathbf{x} \in \{-1, 1\}^n$, we have
$\mathrm{diag}(\mathbf{x}\mathbf{x}^\top) = \mathbf{e}$, and so $\mathrm{trace}(\mathbf{x}\mathbf{x}^\top) = n$. Therefore
$\|\mathbf{X}\|_F \le \eta$ is implicitly involved in the SDP formulation
of BQPs.
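Theorem 1 is easy to check numerically. The sketch below, using hypothetical random matrices, compares the Frobenius norm with the trace for a generic full-rank p.s.d. matrix and for a rank-one matrix $\mathbf{x}\mathbf{x}^\top$ built from a binary vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
eta = float(n)

# Generic p.s.d. matrix rescaled so that trace(X) = eta (full rank almost surely).
G = rng.standard_normal((n, n))
X_full = G @ G.T
X_full *= eta / np.trace(X_full)

# Rank-one matrix from a binary vector: trace(x x^T) = n = eta.
x = rng.choice([-1.0, 1.0], size=n)
X_rank1 = np.outer(x, x)

fro_full = np.linalg.norm(X_full, "fro")    # strictly below eta
fro_rank1 = np.linalg.norm(X_rank1, "fro")  # equals eta
print(fro_full, fro_rank1, eta)
```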

Then we have a geometrical interpretation of SDP relaxation.
The non-convex spherical constraint $\|\mathbf{X}\|_F = \eta$ is
relaxed to the convex inequality constraint $\|\mathbf{X}\|_F \le \eta$:

$$\min_{\mathbf{X} \succeq 0} \; \langle \mathbf{X}, \mathbf{A} \rangle, \quad \text{s.t. } \|\mathbf{X}\|_F^2 - \eta^2 \le 0, \; \text{(6b)}, \; \text{(6c)}. \tag{9}$$

Inspired by the spherical constraint, we consider the following
SDP formulations:

$$\min_{\mathbf{X} \succeq 0} \; \langle \mathbf{X}, \mathbf{A} \rangle, \quad \text{s.t. } \|\mathbf{X}\|_F^2 - \eta^2 \le \rho, \; \text{(6b)}, \; \text{(6c)}. \tag{10}$$

$$\min_{\mathbf{X} \succeq 0} \; \langle \mathbf{X}, \mathbf{A} \rangle + \sigma(\|\mathbf{X}\|_F^2 - \eta^2), \quad \text{s.t. } \text{(6b)}, \; \text{(6c)}. \tag{11}$$

where $\rho < 0$ and $\sigma > 0$ are scalar parameters. Given a $\rho$,
one can always find a $\sigma$, making the problems (10) and (11)
equivalent.

The problem (10) has the same objective function
as (9), but its search space is a subset of the feasible set
of (9). Hence (10) finds a sub-optimal solution to (9). The
gap between the solution of (10) and (9) vanishes when $\rho$
approaches 0.

On the other hand, because $\|\mathbf{X}\|_F^2 - \eta^2 \le 0$, the objective
function of (11) is not larger than the one of (9). When
$\sigma$ approaches 0, the problem (11) is equivalent to (9). For
a small $\sigma$, the solution of (11) approximates the solution
of (9). When $\sigma$ approaches 0, the bound of (11) is arbitrarily
close to the bound of (9).

Although problems (10) and (11) can be converted into
standard SDP problems, solving them using interior-point
methods can be very slow. Next, we show that the dual
of (11) has a much simpler form.



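Result 1. The dual problem of (11) can be simplified to

$$\max_{\mathbf{u}} \; -\frac{1}{4\sigma} \|\mathbf{C}(\mathbf{u})_-\|_F^2 - \mathbf{u}^\top \mathbf{b} - \sigma \eta^2, \quad \text{s.t. } u_j \ge 0, \; \forall j = p+1, \dots, m, \tag{12}$$

where $\mathbf{C}(\mathbf{u}) = \sum_{i=1}^{m} u_i \mathbf{B}_i + \mathbf{A}$.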

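Proof. The Lagrangian of the primal problem (11) is:

$$L(\mathbf{X}, \mathbf{u}, \mathbf{Z}) = \langle \mathbf{X}, \mathbf{A} \rangle - \langle \mathbf{X}, \mathbf{Z} \rangle + \sigma \|\mathbf{X}\|_F^2 - \sigma \eta^2 + \sum_{i=1}^{m} u_i (\langle \mathbf{X}, \mathbf{B}_i \rangle - b_i), \tag{13}$$

with $\mathbf{Z} \succeq 0$ and $u_j \ge 0$, $\forall j = p+1, \dots, m$.
$\mathbf{Z} \in \mathbb{R}^{n \times n}$ is the dual variable w.r.t. the constraint
$\mathbf{X} \succeq 0$; $\mathbf{u} \in \mathbb{R}^m$ is the dual variable w.r.t. the
constraints (6b), (6c).

Since the primal problem (11) is convex, and both the
primal and dual problems are feasible, strong duality holds.
The primal optimal $\mathbf{X}^\star$ is a minimizer of $L(\mathbf{X}, \mathbf{u}^\star, \mathbf{Z}^\star)$, i.e.,
$\nabla_{\mathbf{X}} L(\mathbf{X}, \mathbf{u}^\star, \mathbf{Z}^\star)|_{\mathbf{X} = \mathbf{X}^\star} = 0$. Then we have

$$\mathbf{X}^\star = \frac{1}{2\sigma} \Bigl(\mathbf{Z}^\star - \mathbf{A} - \sum_{i=1}^{m} u_i^\star \mathbf{B}_i\Bigr) = \frac{1}{2\sigma} (\mathbf{Z}^\star - \mathbf{C}(\mathbf{u}^\star)). \tag{14}$$

By substituting $\mathbf{X}^\star$ in the Lagrangian (13), we obtain the
dual problem:

$$\max_{\mathbf{u}, \mathbf{Z}} \; -\frac{1}{4\sigma} \|\mathbf{Z} - \mathbf{C}(\mathbf{u})\|_F^2 - \mathbf{u}^\top \mathbf{b} - \sigma \eta^2, \quad \text{s.t. } \mathbf{Z} \succeq 0, \; u_j \ge 0, \; \forall j = p+1, \dots, m. \tag{15}$$

As the dual (15) is still an SDP problem, it seems that no
efficient method can be used to solve (15) directly, other
than the interior-point algorithms.

Fortunately, the p.s.d. matrix variable $\mathbf{Z}$ can be eliminated.
Given a fixed $\mathbf{u}$, the dual (15) can be simplified to:

$$\min_{\mathbf{Z}} \; \|\mathbf{Z} - \mathbf{C}(\mathbf{u})\|_F^2, \quad \text{s.t. } \mathbf{Z} \succeq 0. \tag{16}$$

Based on (2), the problem (16) has an explicit solution:
$\mathbf{Z} = \mathbf{C}(\mathbf{u})_+$. By substituting $\mathbf{Z}$ into (15), the dual problem is
simplified to (12). $\square$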



We can see that the simplified dual problem (12) is not an
SDP problem. The number of dual variables is $m$, i.e., the
number of constraints in the primal problem (11). In most
cases, $m \ll n^2$ where $n^2$ is the number of primal variables,
and so the problem size of the dual is much smaller than
that of the primal.

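The gradient of the objective function of (12) can be
calculated as

$$g(u_i) = -\frac{1}{2\sigma} \langle \mathbf{C}(\mathbf{u})_-, \mathbf{B}_i \rangle - b_i, \; \forall i = 1, \dots, m. \tag{17}$$

Moreover, the objective function of (12) is differentiable but
not necessarily twice differentiable, which can be inferred
from the results in Sect. 5 in |T| .

Based on the following relationship:

$$\mathbf{X}^\star = \frac{1}{2\sigma} (\mathbf{C}(\mathbf{u}^\star)_+ - \mathbf{C}(\mathbf{u}^\star)) = -\frac{1}{2\sigma} \mathbf{C}(\mathbf{u}^\star)_-, \tag{18}$$

the primal optimal $\mathbf{X}^\star$ can be calculated from the dual
optimal $\mathbf{u}^\star$.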
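The dual objective (12) and its gradient (17) translate directly into code. The following is a hedged NumPy sketch, not the authors' implementation: the function name `dual_value_and_grad`, the dense storage of the constraint matrices in an array `Bs` of shape `(m, n, n)`, and the test instance (the constraints $\mathrm{diag}(\mathbf{X}) = \mathbf{e}$) are all assumptions. The gradient is compared against central finite differences.

```python
import numpy as np

def dual_value_and_grad(u, A, Bs, b, sigma, eta):
    """Value of the dual objective (12) and its gradient (17)."""
    C = A + np.tensordot(u, Bs, axes=1)              # C(u) = A + sum_i u_i B_i
    lam, P = np.linalg.eigh(C)
    neg = lam < 0
    C_minus = (P[:, neg] * lam[neg]) @ P[:, neg].T   # negative part of C(u)
    value = -np.sum(lam[neg] ** 2) / (4 * sigma) - u @ b - sigma * eta ** 2
    inner = np.tensordot(Bs, C_minus, axes=([1, 2], [0, 1]))  # <C(u)_-, B_i> for each i
    grad = -inner / (2 * sigma) - b
    return value, grad

rng = np.random.default_rng(3)
n = m = 6
sigma, eta = 0.1, float(n)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
Bs = np.zeros((m, n, n))
Bs[np.arange(m), np.arange(n), np.arange(n)] = 1.0   # B_i = e_i e_i^T, i.e. diag(X) = e
b = np.ones(m)
u = rng.standard_normal(m)

value, grad = dual_value_and_grad(u, A, Bs, b, sigma, eta)

# Central finite differences as an independent check of (17).
eps = 1e-6
fd = np.zeros(m)
for i in range(m):
    step = np.zeros(m)
    step[i] = eps
    f_hi = dual_value_and_grad(u + step, A, Bs, b, sigma, eta)[0]
    f_lo = dual_value_and_grad(u - step, A, Bs, b, sigma, eta)[0]
    fd[i] = (f_hi - f_lo) / (2 * eps)
```

Because $\|\mathbf{C}(\mathbf{u})_-\|_F^2$ is just the sum of squared negative eigenvalues, the value needs only the negative part of the spectrum, which is what makes a partial eigen-decomposition sufficient.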

Implementation We have used L-BFGS-B [26] for the
optimization of (12). All code is written in MATLAB (with
mex files) and the results are tested on a 2.7GHz Intel CPU.

The convergence tolerance of L-BFGS-B is set
to the default, and the number of limited-memory vectors is
set to 200. Because we need to calculate the value and gradient
of the dual objective function at each gradient-descent
step, a partial eigen-decomposition should be performed to
compute $\mathbf{C}(\mathbf{u})_-$ at each iteration; this is the most
computationally expensive part. The default ARPACK embedded in
MATLAB is used to calculate the eigenvectors corresponding to
eigenvalues smaller than 0. Based on the above analysis, a small
$\sigma$ will improve the solution accuracy; but we find that the
optimization problem becomes ill-posed for an extremely small
$\sigma$, and more iterations are needed for convergence. In our experiments,
a is set within the range of [10~^, 10~^]. 

There are several techniques to speed up the eigen-decomposition
process for SDCut: (1) In many cases, the
matrix $\mathbf{C}(\mathbf{u})$ is sparse or structured, which leads to an
efficient way of calculating $\mathbf{C}\mathbf{x}$ for an arbitrary vector $\mathbf{x}$.
Furthermore, because ARPACK only needs a callback function
for the matrix-vector multiplication, the process of
eigen-decomposition can be very fast for matrices with specific
structures. (2) As the step size of gradient-descent, $\|\Delta\mathbf{u}\|_1$,
becomes significantly small after some initial iterations, the
difference $\|\mathbf{C}(\mathbf{u}) - \mathbf{C}(\mathbf{u} + \Delta\mathbf{u})\|_1$ becomes small as well.
Therefore, the eigenspace of the current $\mathbf{C}$ is a good choice
of the starting point for the next eigen-decomposition
process. A suitable starting point can accelerate convergence
considerably.
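Both speed-ups can be illustrated with SciPy's ARPACK wrapper `scipy.sparse.linalg.eigsh`, which accepts a matrix-free `LinearOperator` and a starting vector `v0`. This is a sketch under assumptions, not the MATLAB setup used in the paper: the dense random matrix stands in for a structured $\mathbf{C}(\mathbf{u})$, the number of requested eigenpairs `k` is a guess (in SDCut one would need all negative eigenpairs), and using the sum of the previous eigenvectors as `v0` is only one plausible way to warm-start.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(4)
n, k = 100, 4
M = rng.standard_normal((n, n))
C = (M + M.T) / 2                     # stand-in for C(u)

# (1) Matrix-free operator: ARPACK only sees the product C @ x.
op = LinearOperator((n, n), matvec=lambda x: C @ x, dtype=float)
lam, P = eigsh(op, k=k, which="SA")   # k algebraically smallest eigenpairs

# (2) After a small step in u, C changes only slightly; reuse the previous
#     eigenspace to build the starting vector of the next decomposition.
dC = 1e-3 * rng.standard_normal((n, n))
C_next = C + (dC + dC.T) / 2
op_next = LinearOperator((n, n), matvec=lambda x: C_next @ x, dtype=float)
v0 = P.sum(axis=1)
lam_next, P_next = eigsh(op_next, k=k, which="SA", v0=v0)
```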

After solving the dual using L-BFGS-B, the optimal
primal $\mathbf{X}^\star$ is calculated from the dual optimal $\mathbf{u}^\star$ based
on (18).



Finally, the optimal variable $\mathbf{X}^\star$ should be discretized to
the feasible binary solution $\mathbf{x}^\star$. The discretization method is
dependent on specific applications, which will be discussed
separately in the section of applications.

In summary, the SDCut is solved by the following steps.
Step 1: Solve the dual problem (12) using L-BFGS-B,
based on the application-specific $\mathbf{A}$, $\mathbf{B}$, $\mathbf{b}$ and the $\sigma$
chosen by the user. The gradient of the objective function is
calculated through (17). The optimal dual variable $\mathbf{u}^\star$ is
obtained when the dual (12) is solved.
Step 2: Compute the optimal primal variable $\mathbf{X}^\star$ using (18).
Step 3: Discretize $\mathbf{X}^\star$ to a feasible binary solution $\mathbf{x}^\star$.
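Steps 1 and 2 can be sketched end to end with SciPy's L-BFGS-B. The code below is a minimal illustration under stated assumptions, not the released implementation: it handles only the equality constraints $\mathrm{diag}(\mathbf{X}) = \mathbf{e}$ of problem (5), uses a dense `numpy.linalg.eigh` instead of a partial ARPACK decomposition, and the name `sdcut_unit_diag` and the value of `sigma` are hypothetical. Inequality constraints would additionally need the `bounds` argument of `scipy.optimize.minimize` to enforce $u_j \ge 0$.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def sdcut_unit_diag(A, sigma=0.05):
    """Sketch of SDCut for min x^T A x, x in {-1,1}^n, with constraints diag(X) = e."""
    n = A.shape[0]
    eta = float(n)
    b = np.ones(n)

    def neg_dual(u):
        C = A + np.diag(u)                                # C(u) with B_i = e_i e_i^T
        lam, P = np.linalg.eigh(C)
        neg = lam < 0
        C_minus = (P[:, neg] * lam[neg]) @ P[:, neg].T
        value = -np.sum(lam[neg] ** 2) / (4 * sigma) - u @ b - sigma * eta ** 2
        grad = -np.diag(C_minus) / (2 * sigma) - b        # Eq. (17)
        return -value, -grad, C_minus                     # negated: minimize() minimizes

    # Step 1: maximize the dual (12) with L-BFGS-B.
    res = minimize(lambda u: neg_dual(u)[:2], np.zeros(n), jac=True, method="L-BFGS-B")
    # Step 2: recover the primal solution via Eq. (18).
    X = -neg_dual(res.x)[2] / (2 * sigma)
    return -res.fun, X

rng = np.random.default_rng(5)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2
bound, X = sdcut_unit_diag(A)

# Brute-force optimum of the small BQP (3), for comparison only.
optimum = min(np.array(s) @ A @ np.array(s)
              for s in itertools.product((-1.0, 1.0), repeat=8))
print(bound, optimum)
```

By weak duality the value returned as `bound` cannot exceed the BQP optimum; Step 3 (discretization) is application specific and is left out here.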

Computational Complexity The complexity for eigen-decomposition
is $O(n^3)$ where $n$ is the number of rows of
matrix $\mathbf{A}$, therefore our method is $O(kn^3)$ where $k$ is the
number of gradient-descent steps of L-BFGS-B. In our experiments,
$k$ can be considered a constant that is independent of the
matrix size. Spectral methods also need
the computation of the eigenvectors of the same matrix $\mathbf{A}$,
which means they have the same order of complexity as
SDCut. As the complexity of interior-point SDP solvers is
$O(n^{6.5})$, our method is much faster than the conventional
SDP method.

Our method can be further accelerated by using faster 
eigen-decomposition method: a problem that has been stud- 
ied in depth for a long time. Efficient algorithms and well 
implemented toolboxes have been available recently. By 
taking advantage of them, SDCut can be applied to even 
larger problems. 

4. Applications 

In this section, we show several applications of SDCut 
in computer vision. Because SDCut can handle different 
types of constraints (equality/inequality, linear/quadratic), 
it can be applied to more problems than spectral methods. 

4.1. Application 1: Graph Bisection 

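Formulation Graph bisection is the problem of separating
the vertices of a weighted graph into two disjoint sets
with equal cardinality, while minimizing the total weight of
cut edges. The problem can be formulated as:

$$\min_{\mathbf{x} \in \{-1, +1\}^n} \; \mathbf{x}^\top \mathbf{L} \mathbf{x}, \quad \text{s.t. } \mathbf{x}^\top \mathbf{e} = 0, \tag{19}$$

where $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the graph Laplacian matrix, $\mathbf{W}$ is
the weighted affinity matrix, and $\mathbf{D} = \mathrm{diag}(\mathbf{W}\mathbf{e})$ is the
degree matrix. The classic spectral clustering approaches,
e.g., RatioCut and NCut [20], are in the following forms:

$$\text{RatioCut:} \quad \min_{\mathbf{x}} \; \mathbf{x}^\top \mathbf{L} \mathbf{x}, \quad \text{s.t. } \mathbf{x}^\top \mathbf{e} = 0, \; \|\mathbf{x}\|_2^2 = n, \tag{20}$$

$$\text{NCut:} \quad \min_{\mathbf{x}} \; \mathbf{x}^\top \tilde{\mathbf{L}} \mathbf{x}, \quad \text{s.t. } \mathbf{x}^\top \mathbf{c} = 0, \; \|\mathbf{x}\|_2^2 = n, \tag{21}$$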




Figure 2: The convergence of the objective
value of the dual (12), which can be seen
as a lower bound. SDCut is tested to bisect
a random graph with 200 vertices and
0.5 density. The bound is better when $\sigma$ is
smaller.




Figure 3: Computation time for graph bisection. All the results are the average of 5 
random graphs. Left: Comparison of SDCut, SeduMi and SDPT3. Right: Comparison 
of SDCut under different edge densities, a is set to 10 ~^ in this case. SDCut is much 
faster than the conventional SDP methods, and is faster when the graph is sparse.
Figure 1: Results of 2d points bisection. The thresholds are set to
the median of score vectors. The two classes of points are shown 
in red '+' and blue 'o'. RatioCut and NCut fail to separate the 
points correctly, while SDCut succeeds. 



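where $\tilde{\mathbf{L}} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$ and $\mathbf{c} = \mathbf{D}^{1/2} \mathbf{e}$. The solutions
of RatioCut and NCut are the eigenvectors corresponding to the
second smallest eigenvalues of $\mathbf{L}$ and $\tilde{\mathbf{L}}$, respectively.
For (19), $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$ satisfies:

$$\mathrm{diag}(\mathbf{X}) = \mathbf{e}, \quad \text{and} \quad \langle \mathbf{X}, \mathbf{e}\mathbf{e}^\top \rangle = 0. \tag{22}$$

Since $\mathbf{x}^\top \mathbf{D} \mathbf{x}$ is constant for $\mathbf{x} \in \{-1, 1\}^n$, we have

$$\min_{\mathbf{x} \in \{-1, 1\}^n} \mathbf{x}^\top \mathbf{L} \mathbf{x} \;\Longleftrightarrow\; \min_{\mathbf{x} \in \{-1, 1\}^n} \mathbf{x}^\top (-\mathbf{W}) \mathbf{x}. \tag{23}$$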



By substituting $-\mathbf{W}$ and the constraints (22) into (6)
and (11), we then have the formulation of the conventional
SDP method and SDCut.
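The equivalence in (23) rests on $\mathbf{x}^\top \mathbf{D} \mathbf{x} = \mathrm{trace}(\mathbf{D})$ for every binary $\mathbf{x}$, so the two objectives differ by a constant. A short numerical check on a hypothetical random graph:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10
W = np.triu(rng.uniform(size=(n, n)), 1)
W = W + W.T                              # symmetric affinities, zero diagonal
D = np.diag(W.sum(axis=1))               # degree matrix D = diag(W e)
L = D - W                                # graph Laplacian

gaps = []
for _ in range(5):
    x = rng.choice([-1.0, 1.0], size=n)
    gaps.append(x @ L @ x - x @ (-W) @ x)    # should equal trace(D) every time
print(gaps, np.trace(D))
```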

To obtain the discrete result from the solution $\mathbf{X}^\star$, we
adopt the randomized rounding method in [4]: a score vector
$\mathbf{x}_r^\star$ is generated from a Gaussian distribution with mean
0 and covariance $\mathbf{X}^\star$, and the discrete vector $\mathbf{x}^\star \in \{-1, 1\}^n$
is obtained by thresholding $\mathbf{x}_r^\star$ with its median. This
process is repeated several times and the final solution is the
one with the best objective value.
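A possible implementation of this rounding step is sketched below. The function name, the number of trials, and the stand-in inputs are assumptions; the selection rule keeps the sample with the lowest value of $\mathbf{x}^\top \mathbf{A} \mathbf{x}$, which is how "best" is read here for a minimization problem.

```python
import numpy as np

def randomized_rounding(X, A, n_trials=50, seed=0):
    """Sample z ~ N(0, X), threshold at the median, keep the best binary vector."""
    rng = np.random.default_rng(seed)
    lam, P = np.linalg.eigh(X)
    F = P * np.sqrt(np.clip(lam, 0.0, None))          # F F^T equals the p.s.d. part of X
    best_x, best_val = None, np.inf
    for _ in range(n_trials):
        z = F @ rng.standard_normal(X.shape[0])       # Gaussian score vector
        x = np.where(z > np.median(z), 1.0, -1.0)     # median threshold gives a bisection
        val = x @ A @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

rng = np.random.default_rng(7)
n = 10
G = rng.standard_normal((n, 4))
X = G @ G.T
d = np.sqrt(np.diag(X))
X = X / np.outer(d, d)                                # stand-in for X*: p.s.d., unit diagonal
W = np.triu(rng.uniform(size=(n, n)), 1)
A = -(W + W.T)                                        # objective matrix -W as in (23)
x_best, val_best = randomized_rounding(X, A)
```

With an even number of vertices and no ties among the scores, thresholding at the median yields two groups of equal size, so the bisection constraint of (19) holds by construction.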



σ           bound    obj      norm    rank   iters
10^-1       -39.04   -20.55   55.05   18     59
5 × 10^-2   -29.91   -20.92   63.80   14     64
10^-2       -22.93   -21.26   81.32   9      79
10^-3       -21.45   -21.29   87.91   7      150
10^-4       -21.31   -21.31   88.68   7      356

Table 1: Effect of $\sigma$. The lower bound, objective value
$\langle \mathbf{X}^\star, -\mathbf{W} \rangle$, norm and rank of $\mathbf{X}^\star$ and iterations are shown in each
column. The number of variables is 19900 for SDP problems. The
results correspond to Fig. 2. Better solution quality and more
iterations are achieved when $\sigma$ becomes small.



Experiments To show that the new SDP formulation has better
solution quality than spectral relaxation, we compare the
bisection results of RatioCut, NCut and SDCut on two
artificial 2-dimensional data sets. As shown in Fig. 1, the data in
the first row contain two point sets with different densities,
and the data in the second row contain an outlier. The similarity
matrix $\mathbf{W}$ is calculated based on the Euclidean distance of
points $i$ and $j$:

$$W_{ij} = \begin{cases} \exp(-d(i,j)^2 / \gamma^2) & \text{if } d(i,j) < r, \\ 0 & \text{otherwise.} \end{cases} \tag{24}$$

The parameter $\gamma$ is set to 0.1 of the maximum distance.
RatioCut and NCut fail to offer satisfactory results on both of
the data sets, possibly due to the loose bound of spectral
relaxation. Our SDCut achieves better results on these data
sets.
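The similarity matrix of (24) could be built as follows. This sketch assumes the reading of (24) as a Gaussian kernel on the Euclidean distance truncated at radius $r$; the function name, the value of `r`, and the random points are hypothetical.

```python
import numpy as np

def similarity_matrix(points, r, gamma_ratio=0.1):
    """Truncated Gaussian similarity, following the reading of Eq. (24)."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.linalg.norm(diff, axis=-1)        # pairwise Euclidean distances d(i, j)
    gamma = gamma_ratio * d.max()            # gamma = 0.1 of the maximum distance
    W = np.exp(-d ** 2 / gamma ** 2)
    W[d >= r] = 0.0                          # pairs farther apart than r are disconnected
    return W, d

rng = np.random.default_rng(8)
points = rng.uniform(size=(30, 2))           # hypothetical 2-d data
W, d = similarity_matrix(points, r=0.5)
```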

Moreover, to demonstrate the impact of the parameter $\sigma$,
we test SDCut on a random graph with different values of $\sigma$. The
graph has 200 vertices and its edge density is 0.5: 50% of
edges are assigned a weight uniformly sampled from
[0, 1], the other half have zero weights. In Fig. 2, we show
the convergence of the objective value of the dual (12), i.e.
a lower bound of the objective value of the problem (6). A
smaller $\sigma$ leads to a higher (better) bound. The optimal
objective value of the conventional SDP method is −21.29.

For $\sigma = 10^{-4}$, the bound of SDCut (−21.31) is very close
to the SDP optimal. Table 1 also shows the objective value,
the Frobenius norm and the rank of solution $\mathbf{X}^\star$. With the
decrease of $\sigma$, the quality of the solution $\mathbf{X}^\star$ is further
optimized (the objective value is smaller and the rank is lower).
However, the price of higher quality is the slow convergence
speed: more iterations are needed for a smaller $\sigma$.

Finally, experiments are performed to compare the computation
time under different conditions. All the times
shown in Fig. 3 are the mean of 5 random graphs when
a is set to 10"^. SDCut, SeDuMi and SDPT3 are com- 
pared with graph sizes ranging from 600 to 2000 vertices. 
Our method is faster than SeDuMi and SDPT3 on all graph 
sizes. When the problem size is larger, the speedup is more 
significant. For graphs with 2000 vertices, SDCut runs 11.5 
times faster than SDPT3 and 17.0 times faster than Se- 
DuMi. The computation time of SDCut is also tested under 
0.2, 0.5 and 0.8 edge density. Our method runs faster for 
smaller edge densities, which validates that our method can 
take the advantage of graph sparsity. 

We also test the memory usage of MATLAB for SDCut, 
SeDuMi and SDPT3. Because L-BFGS-B and ARPACK 
use limited memory, the total memory used by our method 
is also relatively small. Given a graph with 1000 ver- 
tices, SDCut requires 100MB memory, while SeDuMi and 
SDPT3 use around 700MB. 

4.2. Application 2: Image Segmentation 

Formulation In graph based segmentation, images are
represented by weighted graphs $G(V, E)$, with vertices
corresponding to pixels and edges encoding feature similarities
between pixel pairs. A partition $\mathbf{x} \in \{-1, 1\}^n$ is optimized
to cut the minimal edge weights and results in two balanced
disjoint groups. Prior knowledge can be introduced
to improve performance, encoded by labelled vertices of a
graph, i.e., pixels/superpixels in an image. As shown in the
top row of Fig. 4, 10 foreground pixels and 10 background
pixels are annotated by red and blue markers respectively.
Pixels should be grouped together if their markers have the same
color; otherwise they should be separated.

Biased normalized cut (BNCut) (lAJ is an extension of 
NCut [20], which considers the partial group information
of labelled foreground pixels. Prior knowledge is encoded 
as a quadratic constraint on x. The result of BNCut is a 
weighted combination of the eigenvectors of normalized 
Laplacian matrix. One disadvantage of BNCut is that at 
most one quadratic constraint can be incorporated into its 
formulation. Furthermore, no explicit results can be ob- 
tained: the weights of eigenvectors must be tuned by the 
user. In our experiments, we use the parameters suggested 
ind. 

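Unlike BNCut, SDCut can incorporate multiple
quadratic constraints on $\mathbf{x}$. In our method, the
partial group constraints of $\mathbf{x}$ are formulated as:
$(\mathbf{t}_f^\top \mathbf{P} \mathbf{x})^2 \ge \kappa \|\mathbf{t}_f^\top \mathbf{P}\|_1^2$,
$(\mathbf{t}_b^\top \mathbf{P} \mathbf{x})^2 \ge \kappa \|\mathbf{t}_b^\top \mathbf{P}\|_1^2$ and
$((\mathbf{t}_f - \mathbf{t}_b)^\top \mathbf{P} \mathbf{x})^2 \ge \kappa \|(\mathbf{t}_f - \mathbf{t}_b)^\top \mathbf{P}\|_1^2$, where $\kappa \in [0, 1]$.
$\mathbf{t}_f, \mathbf{t}_b \in \{0, 1\}^n$ are the indicator vectors of foreground and
background pixels. $\mathbf{P} = \mathbf{D}^{-1} \mathbf{W}$ is the normalized affinity
matrix, which smoothes the partial group constraints [25].
After lifting, the partial group constraints are:

$$\langle \mathbf{P}^\top \mathbf{t}_f \mathbf{t}_f^\top \mathbf{P}, \mathbf{X} \rangle \ge \kappa \|\mathbf{t}_f^\top \mathbf{P}\|_1^2, \tag{25a}$$

$$\langle \mathbf{P}^\top \mathbf{t}_b \mathbf{t}_b^\top \mathbf{P}, \mathbf{X} \rangle \ge \kappa \|\mathbf{t}_b^\top \mathbf{P}\|_1^2, \tag{25b}$$

$$\langle \mathbf{P}^\top (\mathbf{t}_f - \mathbf{t}_b)(\mathbf{t}_f - \mathbf{t}_b)^\top \mathbf{P}, \mathbf{X} \rangle \ge \kappa \|(\mathbf{t}_f - \mathbf{t}_b)^\top \mathbf{P}\|_1^2. \tag{25c}$$

Methods   BNCut     SDCut     SeDuMi    SDPT3
Time(s)   0.258     23.7      372       329
obj       -112.55   -116.10   -116.30   -116.32

Table 2: Results on image segmentation, which are the mean of
results of images in Fig. 4. SDCut has a similar objective value to
SeDuMi and SDPT3. $\sigma$ is set to 10"^ obj = $\langle \mathbf{x}^\star \mathbf{x}^{\star\top}, -\mathbf{W} \rangle$.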

We have the formulations of the standard SDP and SDCut,
with constraints (22) and (25) for this particular application.
The standard SDP (6) is solved by SeDuMi and SDPT3.
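The lifting of a quadratic constraint on $\mathbf{x}$ into a linear constraint on $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$ can be verified numerically. The sketch below checks the identity $(\mathbf{t}^\top \mathbf{P} \mathbf{x})^2 = \langle \mathbf{P}^\top \mathbf{t}\mathbf{t}^\top \mathbf{P}, \mathbf{x}\mathbf{x}^\top \rangle$ on random data; the random affinities and labels are hypothetical, and reading the norm in the partial group constraints as the $\ell_1$ norm is an assumption about the garbled original.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 12
W = rng.uniform(size=(n, n))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)          # P = D^{-1} W, the normalized affinity matrix

t_f = np.zeros(n)
t_f[:3] = 1.0                                 # hypothetical foreground labels
x = rng.choice([-1.0, 1.0], size=n)
X = np.outer(x, x)

lhs_vector = (t_f @ P @ x) ** 2                           # quadratic form in x
lhs_lifted = np.sum((P.T @ np.outer(t_f, t_f) @ P) * X)   # linear form <., X> in X
upper = np.linalg.norm(t_f @ P, 1) ** 2                   # largest value over binary x
print(lhs_vector, lhs_lifted, upper)
```

Since $|\mathbf{t}^\top \mathbf{P} \mathbf{x}| \le \|\mathbf{t}^\top \mathbf{P}\|_1$ for any binary $\mathbf{x}$, a parameter in $[0, 1]$ then acts as a fraction of the largest attainable value.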

Note that constraint (22) enforces the equal partition; after
rounding, this equal partition may only be partially satisfied,
though. We still use the method in [4] to generate a
score vector, and the threshold is set to 0 instead of the median.

Experiments We test our segmentation method on the
Berkeley segmentation dataset [16]. Images are converted
to Lab color space and over-segmented into SLIC superpixels
using the VLFeat toolbox [24]. The affinity matrix $\mathbf{W}$
is constructed based on the color similarities and spatial
adjacencies between superpixels:

$$W_{ij} = \begin{cases} \exp(-\,\cdots\,) & \text{if } d(i,j) < r, \\ 0 & \text{otherwise,} \end{cases} \tag{26}$$

where $\mathbf{f}_i$ and $\mathbf{f}_j$ are color histograms of superpixels $i$, $j$, and
$d(i, j)$ is the spatial distance between superpixels $i$, $j$.

From Fig. 4, we can see that BNCut did not accurately
extract the foreground, because it cannot use the information
about which pixels cannot be grouped together: BNCut
only uses the information provided by red markers. In contrast,
our method clearly extracts the foreground. We omit
the segmentation results of SeDuMi and SDPT3, since they
are similar to the ones using SDCut. In Table 2, we compare
the CPU time and the objective value of BNCut, SDCut,
SeDuMi and SDPT3. The results are the average of the
five images shown in Fig. 4. In this example, $\sigma$ is set to 10 ~^
for SDCut. All the five images are over-segmented into 760
superpixels, and so the numbers of variables for SDP are
the same (289180). We can see that BNCut is much faster
than SDP based methods, but with higher (worse) objective
values. SDCut achieves a similar objective value to SeDuMi
and SDPT3, and is over 10 times faster than them.

Figure 4: Segmentation results on the Berkeley dataset. The top row shows the original images with partially labelled pixels. Our method
(bottom) achieves better results than BNCut (middle).

4.3. Application 3: Image Co-segmentation 

Formulation Image co-segmentation performs partition
on multiple images simultaneously. The advantage of
co-segmentation over traditional single image segmentation is
that it can recognize the common object over multiple images.
Co-segmentation is conducted by optimizing two criteria:
1) the color and spatial consistency within a single image;
2) the separability of foreground and background over
multiple images, measured by discriminative features, such
as SIFT. Joulin et al. [7] adopted a discriminative clustering
method to the problem of co-segmentation, and used a
low-rank factorization method [8] (denoted by LowRank) to
solve the associated SDP program. The LowRank method
finds a locally-optimal factorization $\mathbf{X} = \mathbf{Y}\mathbf{Y}^\top$, where the
number of columns of $\mathbf{Y}$ is incremented until a certain condition is
met. The formulation of discriminative clustering for
co-segmentation can be expressed as:

$$\min_{\mathbf{x} \in \{-1, 1\}^n} \; \langle \mathbf{x}\mathbf{x}^\top, \mathbf{A} \rangle, \quad \text{s.t. } (\mathbf{x}^\top \boldsymbol{\delta}_i)^2 \le \lambda^2, \; \forall i = 1, \dots, q, \tag{27}$$

where $q$ is the number of images and $n = \sum_{i=1}^{q} n_i$ is the total
number of pixels. Matrix $\mathbf{A} = \mathbf{A}_b + (\mu/n) \mathbf{A}_w$, and $\mathbf{A}_w =
\mathbf{I}_n - \mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2}$ is the intra-image affinity matrix, and
$\mathbf{A}_b = \lambda_k (\mathbf{I} - \mathbf{e}_n \mathbf{e}_n^\top / n)(n \lambda_k \mathbf{I}_n + \mathbf{K})^{-1} (\mathbf{I} - \mathbf{e}_n \mathbf{e}_n^\top / n)$ is the
inter-image discriminative clustering cost matrix. $\mathbf{W}$ is a
block-diagonal matrix, whose $i$th block is the affinity matrix (26)
of the $i$th image, and $\mathbf{D} = \mathrm{diag}(\mathbf{W} \mathbf{e}_n)$. $\mathbf{K}$ is a kernel



matrix, which is based on the $\chi^2$-distance of SIFT features:
$K_{lm} = \exp\bigl(-\sum_{d} (x_d^l - x_d^m)^2 / (x_d^l + x_d^m)\bigr)$. Because
there are multiple quadratic constraints, spectral methods
are not applicable to problem (27).
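The constraints for $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$ are:

$$\mathrm{diag}(\mathbf{X}) = \mathbf{e}, \quad \langle \mathbf{X}, \boldsymbol{\delta}_i \boldsymbol{\delta}_i^\top \rangle \le \lambda^2, \; \forall i = 1, \dots, q. \tag{28}$$

We then introduce $\mathbf{A}$ and the constraints (28) into (6)
and (11) to get the associated SDP formulation.

The strategy in LowRank is employed to recover a score
vector $\mathbf{x}_r^\star$ from the solution $\mathbf{X}^\star$, which is based on the
eigen-decomposition of $\mathbf{X}^\star$. The final binary solution $\mathbf{x}^\star$
is obtained by thresholding $\mathbf{x}_r^\star$ (comparing with 0).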

Experiments The Weizman horse and MSRC
datasets are used for this image co-segmentation. There are
6 to 10 images in each of four classes, namely car-front,
car-back, face and horse. Each image is oversegmented to
400 to 700 SLIC superpixels using VLFeat [24]. The total
number of superpixels for each image class is then
4000 to 7000.

Standard toolboxes like SeDuMi and SDPT3 cannot han- 
dle such large- size problems on a standard desktop. We 
compare SDCut with the LowRank approach. In this ex- 
periment, cr is set to 10~^ for SDCut. As we can see in 
Table [3] the speed of SDCut is about 5.7 times faster than 
LowRank on average. The objective values (to be mini- 
mized) of SDCut are lower than LowRank for all the four 
image classes. Furthermore, the solution of SDCut also has
lower rank than that of LowRank for each class. For car-back,
the largest eigenvalue of the solution for SDCut has
81% of total energy while the one for LowRank only has
56%.

Fig. 5 visualizes the score vector $\mathbf{x}_r^\star$ on some sample
images. The common objects (cars, faces and horses) are
identified by our co-segmentation method. SDCut and
LowRank achieve visually similar results in the experiments.

Figure 5: Co-segmentation results on Weizman horses and MSRC
datasets. The original images, the results of LowRank and SDCut
are illustrated from top to bottom. LowRank and SDCut produce
similar results.

Dataset                     horse    face     car-back   car-front
#Images                     10       10       6          6
#Vars of BQPs (27)          4587     6684     4012       4017
Time(s)   LowRank           1724     3587     2456       2534
          SDCut             430.3    507.0    251.1      1290
obj       LowRank           -4.90    -4.55    -4.19      -4.15
          SDCut             -5.24    -4.94    -4.53      -4.27
rank      LowRank           17       16       13         11
          SDCut             3        3        3          3

Table 3: Performance comparison of LowRank [8] and SDCut
for co-segmentation. SDCut achieves faster speeds and better
solution quality than LowRank, on all the four datasets.
obj = $\langle \mathbf{x}^\star \mathbf{x}^{\star\top}, \mathbf{A} \rangle$. $\sigma$ is set to 10~^.

4.4. Application 4: Image Registration

Formulation In image registration, $K$ source points
must be matched to $L$ target points, where $K \le L$. The
matching should maximize the local feature similarities of
matched-pairs and also the structure similarity between the
source and target graphs. The problem is expressed as a
BQP, as in [18]:

$$\min_{\mathbf{x} \in \{0, 1\}^{KL}} \; \mathbf{h}^\top \mathbf{x} + \alpha \mathbf{x}^\top \mathbf{H} \mathbf{x}, \tag{29a}$$

$$\text{s.t. } \sum_{j} x_{ij} = 1, \; \forall i = 1, \dots, K, \tag{29b}$$

$$\sum_{i} x_{ij} \le 1, \; \forall j = 1, \dots, L, \tag{29c}$$

where $x_{ij} = x_{(i-1)L + j} = 1$ if the source point $i$ is matched
to the target point $j$; otherwise 0. $\mathbf{h} \in \mathbb{R}^{KL}$ records the
local feature similarity between each pair of source-target
points; $H_{ij,kl} = \exp(-(d_{ij} - d_{kl})^2 / \sigma^2)$ encodes the structural
consistency of source points $i$, $j$ and target points $k$, $l$.
By adding one row and one column to $\mathbf{H}$ and $\mathbf{X} = \mathbf{x}\mathbf{x}^\top$,
we have: $\hat{\mathbf{H}} = [0, 0.5\mathbf{h}^\top; 0.5\mathbf{h}, \alpha \mathbf{H}]$, $\hat{\mathbf{X}} = [1, \mathbf{x}^\top; \mathbf{x}, \mathbf{X}]$.
Schellewald et al. [18] formulate the constraints for $\hat{\mathbf{X}}$ as:



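$$\hat{X}_{11} = 1, \tag{30a}$$

$$2 \cdot \mathrm{diag}(\hat{\mathbf{X}}) = \hat{\mathbf{X}}_{1,:}^\top + \hat{\mathbf{X}}_{:,1}, \tag{30b}$$

$$\mathbf{N} \cdot \mathrm{diag}(\mathbf{X}) = \mathbf{e}_K, \tag{30c}$$

$$\mathbf{M} \circ \mathbf{X} = 0, \tag{30d}$$

where $\mathbf{N} = \mathbf{I}_K \otimes \mathbf{e}_L^\top$ and
$\mathbf{M} = \mathbf{I}_K \otimes (\mathbf{e}_L \mathbf{e}_L^\top - \mathbf{I}_L) + (\mathbf{e}_K \mathbf{e}_K^\top - \mathbf{I}_K) \otimes \mathbf{I}_L$.
Constraint (30b) arises from the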

fact that $x_i = x_i^2$; constraint (30c) arises from (29b);
constraint (30d) avoids undesirable solutions that match one
point to multiple points. The SDP formulations are obtained
by introducing into (6) and (11) the matrix $\hat{\mathbf{H}}$ and the
constraints (30a) to (30d). In this case, the BQP is a
$\{0, 1\}$-problem, instead of a $\{-1, 1\}$-problem. Based on (29b),
$\eta = \mathrm{trace}(\hat{\mathbf{X}}) = K + 1$. The binary solution $\mathbf{x}^\star$ is obtained
by solving the linear program:

$$\max_{\mathbf{x}} \; \mathbf{x}^\top \mathrm{diag}(\mathbf{X}^\star), \quad \text{s.t. } \mathbf{x} \ge 0, \; \text{(29b)}, \; \text{(29c)}, \tag{31}$$

which is guaranteed to have integer solutions [18].
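The linear program (31) can be sketched with `scipy.optimize.linprog`. The code is illustrative only: the function name `lp_rounding` is hypothetical, and random scores stand in for $\mathrm{diag}(\mathbf{X}^\star)$. The constraint matrices follow the index convention $x_{ij} = x_{(i-1)L+j}$, so the row-sum constraint (29b) uses $\mathbf{I}_K \otimes \mathbf{e}_L^\top$.

```python
import numpy as np
from scipy.optimize import linprog

def lp_rounding(scores, K, L):
    """Solve (31): maximize scores^T x  s.t.  x >= 0, (29b), (29c)."""
    A_eq = np.kron(np.eye(K), np.ones((1, L)))   # sum_j x_ij = 1 for each source point i
    A_ub = np.kron(np.ones((1, K)), np.eye(L))   # sum_i x_ij <= 1 for each target point j
    res = linprog(-scores,                       # linprog minimizes, so negate the scores
                  A_ub=A_ub, b_ub=np.ones(L),
                  A_eq=A_eq, b_eq=np.ones(K),
                  bounds=(0, None), method="highs")
    return res.x.reshape(K, L)                   # entry (i, j) is x_ij

rng = np.random.default_rng(10)
K, L = 3, 5
scores = rng.uniform(size=K * L)                 # stand-in for diag(X*)
assignment = lp_rounding(scores, K, L)
print(np.round(assignment))
```

Each row of `assignment` should contain a single 1, marking the target point matched to that source point.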

Experiments We apply our registration formulation on 
some toy data and real- world data. For toy data, we firstly 
generate 30 target points from a uniform distribution, and 
randomly select 15 source points. The source points are ro- 
tated and translated by a random similarity transformation 
y = Rx + t with additive Gaussian noise. For the Stan- 
ford bunny data, 50 points are randomly sampled and simi- 
lar transformation and noise are applied, a is set to 10 ~^. 

From Fig. 6, we can see that the source and target points
are matched correctly. For the toy data, our method runs
over 170 times and 50 times faster than SeDuMi and SDPT3
respectively. For the bunny data with 3126250 variables,
SDCut spends 412 seconds and SeDuMi/SDPT3 did not
find solutions after 3 hours running. The improvement in
speed for SDCut is more significant than in previous experiments.
The reason is that the SDP formulation for registration
has many more constraints, which slows down SeDuMi
and SDPT3 but has much less impact on SDCut.

[Figure 6 panels: source points, target points, matching results]

Data                        2d-toy   3d-toy   bunny
# variables in BQP (29)     450      450      2500
Time(s)   SDCut             16.1     19.0     412
          SeDuMi            2828     3259     > 10000
          SDPT3             969      981      > 10000

Figure 6: Registration results. For 2d (top row) and 3d (middle
row) artificial data, 15 source points are matched to a subset of 30
target points. For bunny data (bottom row), there are 50 source
points and 50 target points. $\sigma$ is set to 10 ~^.

5. Conclusion 

In this paper, we have presented an efficient semidefinite 
formulation (SDCut) for BQPs. SDCut produces a similar 
lower bound with the conventional SDP formulation, and 
therefore is tighter than spectral relaxation. Our formula- 
tion is easy to implement by using the L-BFGS-B toolbox 
and standard eigen-decomposition software, and therefore 
is much more scalable than the conventional SDP formu- 
lation. We have applied SDCut to a few computer vision 
problems, which demonstrates its flexibility in formulation. 
Experiments also show the computational efficiency and 
good solution quality of SDCut. 

We have made the code available online.